Deep neural networks are vulnerable to adversarial attacks. Ideally, a robust model shall perform well on both the perturbed training data and the unseen perturbed test data. It is found empirically that fitting perturbed training data is not hard, but generalizing to perturbed test data is quite difficult. To better understand adversarial generalization, it is of great interest to study the adversarial Rademacher complexity (ARC) of deep neural networks. However, how to bound ARC in multi-layers cases is largely unclear due to the difficulty of analyzing adversarial loss in the definition of ARC. There have been two types of attempts of ARC. One is to provide the upper bound of ARC in linear and one-hidden layer cases. However, these approaches seem hard to extend to multi-layer cases. Another is to modify the adversarial loss and provide upper bounds of Rademacher complexity on such surrogate loss in multi-layer cases. However, such variants of Rademacher complexity are not guaranteed to be bounds for meaningful robust generalization gaps (RGG). In this paper, we provide a solution to this unsolved problem. Specifically, we provide the first bound of adversarial Rademacher complexity of deep neural networks. Our approach is based on covering numbers. We provide a method to handle the robustify function classes of DNNs such that we can calculate the covering numbers. Finally, we provide experiments to study the empirical implication of our bounds and provide an analysis of poor adversarial generalization.
translated by 谷歌翻译
自Reddi等人以来。 2018年指出了亚当的分歧问题,已经设计了许多新变体以获得融合。但是,香草·亚当(Vanilla Adam)仍然非常受欢迎,并且在实践中效果很好。为什么理论和实践之间存在差距?我们指出,理论和实践的设置之间存在不匹配:Reddi等。 2018年选择亚当的超参数后选择问题,即$(\ beta_1,\ beta_2)$;虽然实际应用通常首先解决问题,然后调整$(\ beta_1,\ beta_2)$。由于这一观察,我们猜想只有当我们改变选择问题和超参数的顺序时,理论上的经验收敛才能是合理的。在这项工作中,我们确认了这一猜想。我们证明,当$ \ beta_2 $很大时,$ \ beta_1 <\ sqrt {\ beta_2} <1 $,Adam收集到关键点附近。邻居的大小是随机梯度方差的命题。在额外的条件(强烈生长条件)下,亚当收敛到关键点。随着$ \ beta_2 $的增加,我们的收敛结果可以覆盖[0,1)$中的任何$ \ beta_1 \,包括$ \ beta_1 = 0.9 $,这是深度学习库中的默认设置。我们的结果表明,亚当可以在广泛的超参数下收敛,而无需对其更新规则进行任何修改。据我们所知,我们是第一个证明这一结果的人,而没有强有力的假设,例如有限梯度。当$ \ beta_2 $很小时,我们进一步指出了一个$(\ beta_1,\ beta_2)$的大区域,亚当可以在其中偏离无限。我们的差异结果考虑与我们的收敛结果相同的设置,表明在增加$ \ beta_2 $时从差异到收敛的相变。这些正面和负面的结果可以提供有关如何调整亚当超级参数的建议。
translated by 谷歌翻译
模仿学习从专家轨迹中学习政策。尽管据信专家数据对于模仿质量至关重要,但发现一种模仿学习方法,对抗性模仿学习(AIL)可以具有出色的性能。只需仅仅在一个专家轨迹上,即使在诸如运动控制之类的任务上,AIL也可以符合专家的性能。这种现象有两个神秘的要点。首先,为什么AIL只能使用几个专家轨迹表现良好?其次,尽管计划范围的时间长,但为什么AIL仍能保持良好的性能?在本文中,我们从理论上探讨了这两个问题。对于总基于差异的ail(称为TV-ail),我们的分析显示了一个无水平的模仿差距$ \ MATHCAL O(\ {\ {\ min \ {1,\ sqrt {| \ Mathcal S |/n} \})$在从运动控制任务中抽象的一类实例上。这里$ | \ Mathcal S | $是表格Markov决策过程的状态空间大小,而$ n $是专家轨迹的数量。我们强调了界限的两个重要特征。首先,在小样本制度中,这种界限都是有意义的。其次,这一界限表明,无论计划范围如何,电视填充的模仿缝隙最多都是1。因此,这种结合可以解释经验观察。从技术上讲,我们利用了电视填充中多阶段策略优化的结构,并通过动态编程提出了新的舞台耦合分析
translated by 谷歌翻译
在许多机器学习应用程序中出现了非convex-concave min-max问题,包括最大程度地减少一组非凸函数的最大程度,并对神经网络的强大对抗训练。解决此问题的一种流行方法是梯度下降(GDA)算法,不幸的是,在非凸性的情况下可以表现出振荡。在本文中,我们引入了一种“平滑”方案,该方案可以与GDA结合以稳定振荡并确保收敛到固定溶液。我们证明,稳定的GDA算法可以实现$ O(1/\ epsilon^2)$迭代复杂性,以最大程度地减少有限的非convex函数收集的最大值。此外,平滑的GDA算法达到了$ O(1/\ epsilon^4)$ toseration复杂性,用于一般的nonconvex-concave问题。提出了这种稳定的GDA算法的扩展到多块情况。据我们所知,这是第一个实现$ o(1/\ epsilon^2)$的算法,用于一类NonConvex-Concave问题。我们说明了稳定的GDA算法在健壮训练中的实际效率。
translated by 谷歌翻译
Weakly-supervised object localization aims to indicate the category as well as the scope of an object in an image given only the image-level labels. Most of the existing works are based on Class Activation Mapping (CAM) and endeavor to enlarge the discriminative area inside the activation map to perceive the whole object, yet ignore the co-occurrence confounder of the object and context (e.g., fish and water), which makes the model inspection hard to distinguish object boundaries. Besides, the use of CAM also brings a dilemma problem that the classification and localization always suffer from a performance gap and can not reach their highest accuracy simultaneously. In this paper, we propose a casual knowledge distillation method, dubbed KD-CI-CAM, to address these two under-explored issues in one go. More specifically, we tackle the co-occurrence context confounder problem via causal intervention (CI), which explores the causalities among image features, contexts, and categories to eliminate the biased object-context entanglement in the class activation maps. Based on the de-biased object feature, we additionally propose a multi-teacher causal distillation framework to balance the absorption of classification knowledge and localization knowledge during model training. Extensive experiments on several benchmarks demonstrate the effectiveness of KD-CI-CAM in learning clear object boundaries from confounding contexts and addressing the dilemma problem between classification and localization performance.
translated by 谷歌翻译
In this paper, a semantic communication framework for image transmission is developed. In the investigated framework, a set of servers cooperatively transmit images to a set of users utilizing semantic communication techniques. To evaluate the performance of studied semantic communication system, a multimodal metric is proposed to measure the correlation between the extracted semantic information and the original image. To meet the ISS requirement of each user, each server must jointly determine the semantic information to be transmitted and the resource blocks (RBs) used for semantic information transmission. We formulate this problem as an optimization problem aiming to minimize each server's transmission latency while reaching the ISS requirement. To solve this problem, a value decomposition based entropy-maximized multi-agent reinforcement learning (RL) is proposed, which enables servers to coordinate for training and execute RB allocation in a distributed manner to approach to a globally optimal performance with less training iterations. Compared to traditional multi-agent RL, the proposed RL improves the valuable action exploration of servers and the probability of finding a globally optimal RB allocation policy based on local observation. Simulation results show that the proposed algorithm can reduce the transmission delay by up to 16.1% compared to traditional multi-agent RL.
translated by 谷歌翻译
New architecture GPUs like A100 are now equipped with multi-instance GPU (MIG) technology, which allows the GPU to be partitioned into multiple small, isolated instances. This technology provides more flexibility for users to support both deep learning training and inference workloads, but efficiently utilizing it can still be challenging. The vision of this paper is to provide a more comprehensive and practical benchmark study for MIG in order to eliminate the need for tedious manual benchmarking and tuning efforts. To achieve this vision, the paper presents MIGPerf, an open-source tool that streamlines the benchmark study for MIG. Using MIGPerf, the authors conduct a series of experiments, including deep learning training and inference characterization on MIG, GPU sharing characterization, and framework compatibility with MIG. The results of these experiments provide new insights and guidance for users to effectively employ MIG, and lay the foundation for further research on the orchestration of hybrid training and inference workloads on MIGs. The code and results are released on https://github.com/MLSysOps/MIGProfiler. This work is still in progress and more results will be published soon.
translated by 谷歌翻译
With the development of technology and sharing economy, Airbnb as a famous short-term rental platform, has become the first choice for many young people to select. The issue of Airbnb's pricing has always been a problem worth studying. While the previous studies achieve promising results, there are exists deficiencies to solve. Such as, (1) the feature attributes of rental are not rich enough; (2) the research on rental text information is not deep enough; (3) there are few studies on predicting the rental price combined with the point of interest(POI) around the house. To address the above challenges, we proposes a multi-source information embedding(MSIE) model to predict the rental price of Airbnb. Specifically, we first selects the statistical feature to embed the original rental data. Secondly, we generates the word feature vector and emotional score combination of three different text information to form the text feature embedding. Thirdly, we uses the points of interest(POI) around the rental house information generates a variety of spatial network graphs, and learns the embedding of the network to obtain the spatial feature embedding. Finally, this paper combines the three modules into multi source rental representations, and uses the constructed fully connected neural network to predict the price. The analysis of the experimental results shows the effectiveness of our proposed model.
translated by 谷歌翻译
Domain adaptive detection aims to improve the generalization of detectors on target domain. To reduce discrepancy in feature distributions between two domains, recent approaches achieve domain adaption through feature alignment in different granularities via adversarial learning. However, they neglect the relationship between multiple granularities and different features in alignment, degrading detection. Addressing this, we introduce a unified multi-granularity alignment (MGA)-based detection framework for domain-invariant feature learning. The key is to encode the dependencies across different granularities including pixel-, instance-, and category-levels simultaneously to align two domains. Specifically, based on pixel-level features, we first develop an omni-scale gated fusion (OSGF) module to aggregate discriminative representations of instances with scale-aware convolutions, leading to robust multi-scale detection. Besides, we introduce multi-granularity discriminators to identify where, either source or target domains, different granularities of samples come from. Note that, MGA not only leverages instance discriminability in different categories but also exploits category consistency between two domains for detection. Furthermore, we present an adaptive exponential moving average (AEMA) strategy that explores model assessments for model update to improve pseudo labels and alleviate local misalignment problem, boosting detection robustness. Extensive experiments on multiple domain adaption scenarios validate the superiority of MGA over other approaches on FCOS and Faster R-CNN detectors. Code will be released at https://github.com/tiankongzhang/MGA.
translated by 谷歌翻译
Although deep learning has made remarkable progress in processing various types of data such as images, text and speech, they are known to be susceptible to adversarial perturbations: perturbations specifically designed and added to the input to make the target model produce erroneous output. Most of the existing studies on generating adversarial perturbations attempt to perturb the entire input indiscriminately. In this paper, we propose ExploreADV, a general and flexible adversarial attack system that is capable of modeling regional and imperceptible attacks, allowing users to explore various kinds of adversarial examples as needed. We adapt and combine two existing boundary attack methods, DeepFool and Brendel\&Bethge Attack, and propose a mask-constrained adversarial attack system, which generates minimal adversarial perturbations under the pixel-level constraints, namely ``mask-constraints''. We study different ways of generating such mask-constraints considering the variance and importance of the input features, and show that our adversarial attack system offers users good flexibility to focus on sub-regions of inputs, explore imperceptible perturbations and understand the vulnerability of pixels/regions to adversarial attacks. We demonstrate our system to be effective based on extensive experiments and user study.
translated by 谷歌翻译